Skip to content

Conversation

@mre
Copy link
Member

@mre mre commented Feb 6, 2025

Fixes support for relative links.

This is not ready yet. Just putting it here so that I don't forget about it.

Fixes #1574

@mre mre changed the title Relative-link-fixes Fix relative links Feb 6, 2025
@mre
Copy link
Member Author

mre commented Feb 7, 2025

@mre
Copy link
Member Author

mre commented Feb 11, 2025

Fragment handling doesn't work yet, e.g.

[ERROR] file:///Users/foo/bar/file.md%23some-fragment | Cannot find file

but the Markdown file contains the fragment:

### Some Fragment

@mre
Copy link
Member Author

mre commented Feb 11, 2025

Files with spaces don't work either:

[ERROR] file:///Users/foo/bar/My%2520Image.png | Cannot find file

@mre
Copy link
Member Author

mre commented Mar 20, 2025

@mre
Copy link
Member Author

mre commented Aug 28, 2025

@katrinafyi, this is an old branch where I attempted to resolve the confusion around base URL and root path. Needless to say, it didn't go too well. 😆

From the outside, it could sound like relative links have nothing to do with base URLs or root paths, but that's very much the case, actually.

My solution is super hacky and half-baked at the moment.
If you like, could you take a look at my attempt and see if we can cherry-pick some changes into a clean branch which would get us closer to resolve some problems around path handling? Alternatively, let's ditch it and start anew. The new documentation from #1787 might come in handy.

@katrinafyi
Copy link
Contributor

@mre Hmm I see. I'm fairly new here so I'm not quite across what the issues are/were. It seems to be about relative links in the presence of base-url? I see that there is some confusion about the meaning of the arguments, so hopefully #1787 does help reduce the confusion.

That said, in my time working with Lychee, I've formed the opinion that --base-url is inconsistent and best avoided. If you don't specify it, then relative links seem to "just work" as you'd expect, and root-relative links (beginning with /) require --root-dir to be specified, also as you'd expect. Adding --base-url only causes problems because Lychee (currently) tries to make everything relative to that base URL.

Re potential fixes, I can't quite tell what changes this PR tried to make (the changes are many and I'm unfamiliar). I don't know if you saw, but in #1718 (comment), I wrote about what I imagined as a new behaviour for --base-url and --root-dir, with the goal of being detailed enough to implement. You can check this and give feedback as to whether it lines up with your plans :)

Also in the same comment, under the But what about remaps? section, I talk about how you can get a sensible base-url-like behaviour with the current Lychee options. You can test this and see if it works. If it does, it might be good to promote this workaround in the short term - it seems lots of people are affected by this base-url confusion.

Finally, I'd be happy to help but I think this touches a lot of Lychee's internals which I'm not familiar with. Lychee has some kind of async pipeline of handlers to process the links? I assume that implementing #1718 (comment) would need to be done in a certain step of this pipeline. But I'm really not sure.

@mre
Copy link
Member Author

mre commented Aug 28, 2025

True, I remember coming across your proposal in the comment you mentioned and it sounded sensical to me. We can go forward with your approach.

My understanding of base-url still stems from the early lychee days where I always assumed that it should behave similar to the HTML <base> tag: it would only be applicable to remote URLs and never the file system, which is where I mentally always drew the line between base-url and root-dir.
But, of course, nowadays base-url is used very differently. We prepend the base-url to a relative path in the file system to build a remote URL and this is where most of the confusion comes from.

The modern interpretation of base-url (as documented in the new help messages) makes a lot more sense.

At the moment, users get a lot of InvalidBaseJoin warnings as shown here, which is not pretty. My idea was to try and fix that with this PR. I believe the fixtures in this PR could be helpful to document the behavior for the future. Also the tests from https://github.com/lycheeverse/lychee/pull/1624/files#diff-aa1ee21334d5352e4b009dccb9d6c9d58020bd48300963a9a621b4e73c8c7cb3.

Finally, I'd be happy to help but I think this touches a lot of Lychee's internals which I'm not familiar with. Lychee has some kind of async pipeline of handlers to process the links? I assume that implementing #1718 (comment) would need to be done in a certain step of this pipeline. But I'm really not sure.

Yes, that is correct.

This part is here. It's very ugly right now. It's where most of my failed attempts to find a coherent way to harmonize root-dir and base-url took place. This is unmergeable as-is, but could serve as a checklist/blueprint for future work.

The rest of the PR is just tests and moving stuff around, but the core logic is in that section linked above. Guess you could ignore all the async channel handling for now. No rush on my end to get these changes in, but comments/changes are welcome. Maybe we have to break this PR down into smaller pieces which we can merge.

@katrinafyi
Copy link
Contributor

katrinafyi commented Sep 7, 2025

I've had some time to think about this and try some things. I thought I'd check in to make sure we're on the same page. What is the actual problem which we are trying to solve?

I had quick a look through open issues which mention relative links:

So, I am not sure which issues remain, and an implementation of my plan in #1718 (comment) is not really a direct fix to these issues. Rather, it is a fix to the (imo) surprising interaction of --base-url and --root-dir. This is what I want, but is it what you want? :-)

If it is, in my current code (not ready to see the light of day), I have it so that the behaviour should be unchanged if only one of --root-dir or --base-url is specified. The new behaviour only applies if both are given, and it applies the base URL by mapping subpaths of root-dir into subpaths of base-url. If you only specify --base-url, that will still apply indiscriminately and uniformly to all inputs.

I don't know if this is a reasonable way to re-interpret the command-line options, or whether new flags should be added.

I'm also running into lots of terrific issues edge case issues ;-;

Edit: I should add that I think base-url should be changed to require root-dir, and the current base-url behaviour without root-dir could be provided as --fallback-base-url.

Also, in terms of issues that need fixing, I think that https://lychee.cli.rs/recipes/base-url/ is incorrect and misleading. It should be changed to line up with the current behaviour, or it should be rewritten after base-url is changed.

Edit 2: my work can be previewed at rina-forks#3. commentary is welcome.

@mre
Copy link
Member Author

mre commented Sep 9, 2025

What is the actual problem which we are trying to solve?

I want to get rid of message like this:

[WARN] Error creating request: InvalidBaseJoin("...")

In my opinion, we should either fix that root cause or alternative set the logging level to trace.

So, I am not sure which issues remain, and an implementation of my plan in #1718 (comment) is not really a direct fix to these issues. Rather, it is a fix to the (imo) surprising interaction of --base-url and --root-dir. This is what I want, but is it what you want? :-)

To be honest, I'm not 100% sure. Probably if we have a clear distinction between --base-url and --root-dir, my issue will also go away or at least it will be easier to fix. In general, users should never have to deal with InvalidBaseJoin; either we can resolve the path or we can't. If we can't we ignore the link. But I want lychee to do the "right thing" in every scenario where the user specified --base-url, --root-dir or a combination of those. The current error message is very confusing even for me who implemented it. As you said, not very actionable.

I'm also running into lots of terrific issues edge case issues ;-;

Sorry to hear that. But I think it's also one of the hardest problems to crack right now in the entire codebase. So hang in there. 😅

Also, in terms of issues that need fixing, I think that lychee.cli.rs/recipes/base-url is incorrect and misleading. It should be changed to line up with the current behaviour, or it should be rewritten after base-url is changed.

Totally. It's not great. We should completely rewrite it and perhaps add a warning for now.

@katrinafyi
Copy link
Contributor

katrinafyi commented Sep 10, 2025

I see. In that case, I can suggest:

  • (1) Require absolute domain for base-url. This should prevent InvalidBaseJoin when a base-url is specified. It happens when a user gives a base URL that is unsuitable for basing relative URLs.
  • (2) For the case where a user tries to check relative links without giving base-url or root-dir, this should fail and show a clear and actionable message. I think that, in these cases, the InvalidBaseJoin error should be promoted to a real error that's reported as a failed link check. This will allow displaying a suggestion message using the logic that exists for other errors.
  • (3) Also, I think that hiding the message and silently skipping relative URLs would be bad.

These changes would improve the user experience and suggestions would direct them to correct usage of the flags, reducing confusion.

However, even with these changes, I think that the behaviour when using root-dir and base-url together is unavoidably surprising. For doing the "right thing" in that case, that would be the space of #1718 (comment), which we will call (4).

@mre
Copy link
Member Author

mre commented Sep 10, 2025

Thanks for the thoughtful comment. I'm wondering if in this case it makes sense to even continue with this PR or fix it for good using the ideas from the linked comment. I personally don't care too much. If you believe that this PR could be the basis for future work, I can try to brush it up. If not, I'll just go ahead and close it and we work on the proper fix. What do you think? 😀

@katrinafyi
Copy link
Contributor

Idk about this PR. It has lot of changes but it was never clear to me how it would fix relative links. So, my plans don't require the progression of this PR.

That said, maybe there are some useful test cases in here. Also, it might be useful to keep the PR open to show it's being worked on and discussed? At least until we have some other PRs working on this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3094bbc broke fragments checking: InvalidBaseJoin

3 participants